ABSTRACT

Multidisciplinary  design  optimization  (MDO)  gives  rise  to  nonlinear  optimization  problems 
characterized  by  a  large  number  of  constraints  that  naturally  occur  in  blocks.  We  propose  a 
class  of  multilevel  optimization  methods  motivated  by  the  structure  and  number  of  constraints 
and  by  the  expense  of  the  derivative  computations  for  MDO.  The  algorithms  are  an  extension  to 
the  nonlinear  programming  problem  of  the  successful  class  of  local  Brown-Brent  algorithms  for 
nonlinear  equations.  Our  extensions  allow  the  user  to  partition  constraints  into  arbitrary  blocks  to 
fit  the  application,  and  they  separately  process  each  block  and  the  objective  function,  restricted  to 
certain subspaces. The methods use trust regions as a globalization strategy, and they have been
shown  to  be  globally  convergent  under  reasonable  assumptions.  The  multilevel  algorithms  can  be 
applied  to  all  classes  of  MDO  formulations.  Multilevel  algorithms  for  solving  nonlinear  systems 
of  equations  are  a  special  case  of  the  multilevel  optimization  methods.  In  this  case,  they  can  be 
viewed  as  a  trust-region  globalization  of  the  Brown-Brent  class. 
1 Introduction


This  work  is  concerned  with  a  class  of  methods,  called  multilevel  optimization  algorithms,  for 
solving  the  nonlinear  equality  constrained  optimization  problem,  i.e., 

Problem  EQC: 


minimize $f(x)$
subject to $C(x) = 0$,

where $f : \mathbb{R}^n \to \mathbb{R}$ and $C : \mathbb{R}^n \to \mathbb{R}^m$, $m < n$, are at least twice continuously differentiable.

The  proposed  class  of  algorithms  can  be  used  to  solve  any  general  nonlinear  equality  constrained 
optimization  problem,  but  its  development  has  been  motivated  by  the  engineering  design  problems 
that give rise to large-scale optimization formulations with constraints occurring naturally in blocks.
In  particular,  in  the  multidisciplinary  design  optimization  (MDO)  environment,  the  sheer  number 
of  constraints,  the  structure  of  the  problems,  and  the  expense  of  the  derivative  computations 
necessitate  the  development  of  flexible  algorithms  that  allow  the  user  to  partition  the  problem  into 
a  set  of  smaller  problems. 

While there are a number of nonlinear optimization methods that attack large problems by
decomposing them into several smaller ones, these methods require the problems to have a special
structure, for example, separability and convexity.

In  particular,  in  engineering,  decomposition  and  multilevel  optimization  have  been  used  to 
solve large problems for some time. See [15] and [29] for surveys. The process of decomposition
and multilevel formulation generally depends on identifying groups of variables and constraints
that influence each other only weakly. The problem is then decomposed into such weakly coupled
subproblems in various possible formulations, some hierarchic, some nonhierarchic. Recent
developments in formulations can be found in [3] and [9]. Some of the approaches in [3] have
proven successful for many problems. To become more widely applicable, they require the
development of theoretical foundations.

We propose a class of multilevel optimization methods (see [1]) for solving the nonlinear equality
constrained optimization problem, characterized by the following features:

•  The  constraints  of  the  problem  can  be  partitioned  into  blocks  in  any  manner  suitable  to  an 
application,  or  in  any  arbitrary  manner  at  all.  The  analysis  of  the  methods  assumes  certain 
standard  smoothness  and  boundedness  properties,  but  no  other  assumptions  are  made  on  the 
structure  of  the  problem.  There  is  no  need  to  identify  the  weakly  coupled  groups  of  variables 
and  constraints,  although  that  may  be  helpful  in  practice.  If  all  constraints  and  variables 
are  strongly  coupled,  the  partitioning  can  be  done  according  to  any  other  criterion  useful  to 
a  particular  application,  for  example,  just  the  size  of  constraint  blocks.  The  algorithm  then 
solves  progressively  smaller  dimensional  subproblems  to  arrive  at  the  trial  step. 

•  The  multilevel  methods  belong  to  the  class  of  out-of-core  methods.  To  the  authors’  knowledge, 
the  multilevel  algorithms  are  the  only  algorithms  for  general  nonlinear  optimization  problems 
that  require  only  a  currently  processed  part  of  the  constraints  to  be  held  in  memory.  Thus, 
theoretically,  there  is  no  limit  to  the  size  of  the  problem  the  methods  can  handle. 
•  The trial steps computed by the algorithm are required to satisfy very mild conditions, both
theoretically  and  computationally.  In  fact,  the  substeps  comprising  the  trial  step  can  be 
computed  in  the  subproblems  using  different  optimization  algorithms.  The  substeps  are 
only  required  to  satisfy  a  mild  decrease  condition  for  the  subproblems  and  a  reasonable 
boundedness  condition — both  satisfied  in  practice  by  most  methods  of  interest.  This  feature 
is  of  great  practical  significance  because  in  applications  like  MDO  various  constraint  blocks 
may  originate  from  different  disciplines  and  may  require  different  approaches  to  solving  the 
subproblems. 

•  The  class  uses  trust  regions  as  a  globalization  strategy.  The  algorithms  are  proven  to  converge 
under  reasonable  assumptions. 

•  The  algorithms  together  with  their  convergence  theory  provide  a  foundation  for  developing 
the  algorithms  and  analyses  of  the  general  multilevel  optimization  formulations. 

The  proposed  multilevel  class  of  algorithms  differs  from  the  conventional  algorithms  in  that  its 
major  iteration  involves  computing  an  approximate  solution  of  not  one  model  over  a  single  restricted 
region,  but  of  a  sweep  of  models,  each  approximately  minimized  over  its  own  restricted  region.  Each 
model  approximates  a  block  of  constraints  and,  finally,  the  objective  function,  restricted  to  certain 
subspaces.  Each  model  is  computed  at  a  different  point.  The  case  of  a  single  block  of  constraints 
is  included. 

In  the  next  section  we  introduce  the  foundations  on  which  the  proposed  class  of  algorithms  rests. 
Section  3  is  devoted  to  the  description  of  the  class.  Section  4  briefly  describes  current  theoretical 
results.  Section  5  concludes  with  a  summary  and  discussion  of  current  and  future  research. 

2  Preliminaries 

The  proposed  class  of  algorithms  may  be  viewed  as  an  extension  of  several  areas  of  research.  In 
this  section  we  describe  the  existing  algorithms  and  analysis  schemes  which  serve  as  a  foundation 
for the multilevel optimization methods.

2.1  The  Local  Brown-Brent  Class  of  Methods 

Theoretical origins of this research lie in the method for solving nonlinear systems of equations,
$F(x) = 0$, $F : \mathbb{R}^n \to \mathbb{R}^n$, introduced by Brown in [5], [6], [7]. In [4], Brent viewed Brown’s method
from  a  different  perspective,  which  allowed  Brent  to  propose  a  class  of  methods,  among  which 
Brown’s  original  method  was  a  special  case.  Gay  [14]  and  Martinez  [23],  [24]  provided  further 
modifications  and  generalizations  of  the  methods. 

The following statement of the general Brown-Brent algorithm was condensed from the
descriptions in Gay [14] and Dennis [17]. In these works the algorithm is described in terms of
one-dimensional blocks.

Denote the components of $F(x)$ by $F_1(x), \ldots, F_n(x)$.

Algorithm  2.1  Local  Brown-Brent  Algorithm  for  Nonlinear  Systems 

Let $x_c$ be the current approximation to the solution.

Outer Loop: Do until convergence:

$y_0 = x_c$
$H_0 = \mathbb{R}^n$

Inner Loop: Do $k = 1, \ldots, n$

1. Form the linearization, $L_k$, about $y_{k-1}$ of $F_k$ restricted to $\bigcap_{j=0}^{k-1} H_j$.
$L_k = 0$ defines $H_k$, an $(n-k)$-dimensional hyperplane in $\mathbb{R}^n$.

2. Move from $y_{k-1} \in \bigcap_{j=0}^{k-1} H_j$ to $y_k \in \bigcap_{j=0}^{k} H_j$.

End Inner Loop

$x_c = y_n$

End Outer Loop

The point $y_n$ of intersection of all the hyperplanes is the point where all the linearizations vanish.
The way in which steps 1-2 of the inner loop are actually done determines the particular kind
of Brown-Brent method. In Brent’s method, $s_k = y_k - y_{k-1}$ is the shortest $\ell_2$ norm step from
$y_{k-1}$ to $H_k$. In Brown’s method, $s_k$ is the shortest norm step from $y_{k-1}$ to $H_k$ parallel to the $k$-th
coordinate axis.

When applied to a linear system of equations, i.e., when $F(x) = Ax - b$, Brown’s method is
equivalent  to  Gaussian  elimination  with  pivoting  about  the  maximum  row  element  of  the  reduced 
matrix  [5],  while  Brent’s  method  is  equivalent  to  factoring  A  into  a  product  of  a  lower  triangular 
matrix  and  an  orthogonal  matrix  [4].  It  can  be  shown,  based  on  [31],  that  there  exists  a  Brown-Brent 
analog  for  any  matrix  decomposition  in  the  linear  case. 
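The inner sweep of Algorithm 2.1 can be sketched in a few lines. The following is a minimal illustration, assuming one-dimensional blocks and the Brent choice of step (the shortest $\ell_2$ step inside the intersection of the earlier hyperplanes). The names `brent_sweep`, `funcs`, and `grads` are hypothetical, and the Gram-Schmidt projection is one possible way to stay inside the earlier hyperplanes, not necessarily the one used in the cited works.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def brent_sweep(funcs, grads, x):
    """One inner sweep of a Brent-type method with one-dimensional blocks.

    funcs[k](y) returns F_k(y); grads[k](y) returns the gradient of F_k at y.
    Both lists are hypothetical inputs chosen for this sketch.
    """
    y = list(x)
    basis = []  # orthonormal vectors spanning the gradients processed so far
    for F_k, grad_k in zip(funcs, grads):
        g = grad_k(y)  # linearize F_k about the newest point y_{k-1}
        # Project g onto the intersection of the earlier hyperplanes.
        p = list(g)
        for q in basis:
            c = dot(p, q)
            p = [pi - c * qi for pi, qi in zip(p, q)]
        norm_p = math.sqrt(dot(p, p))
        if norm_p < 1e-12:
            raise ValueError("linearizations are (nearly) dependent")
        # Shortest l2 step inside that intersection onto {L_k = 0}.
        t = -F_k(y) / dot(g, p)
        y = [yi + t * pi for yi, pi in zip(y, p)]
        basis.append([pi / norm_p for pi in p])
    return y


# Linear example F(x) = Ax - b: a single sweep reaches the solution.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
funcs = [lambda y, a=a, bi=bi: dot(a, y) - bi for a, bi in zip(A, b)]
grads = [lambda y, a=a: a for a in A]
x_new = brent_sweep(funcs, grads, [0.0, 0.0])
```

For a linear system the linearizations are exact, so the sweep behaves like a direct factorization, which is the behavior the paragraph above describes.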

Brown [5], [7], Brown and Dennis [8], Brent [4], and Gay [14] established local quadratic
convergence of variants of the algorithm, both for analytic and finite difference derivatives. To the
authors’ knowledge, there had been no theoretically supported global extensions of Brown-Brent
methods until [1].

2.2  Trust-Region Methods

Consider  the  following  unconstrained  minimization  problem. 

Problem  UNC: 


minimize $f(x)$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Given $x_c$, the current approximation to the solution,
a trust-region algorithm for solving the problem finds a trial step by solving the following
trust-region subproblem approximately:

minimize $f(x_c) + \nabla f(x_c)^T s + \frac{1}{2} s^T H_c s$  (1)

subject to $\|s\| \le \delta_c$,

where $f(x_c), \delta_c \in \mathbb{R}$; $\nabla f(x_c), s \in \mathbb{R}^n$; $H_c = H_c^T \in \mathbb{R}^{n \times n}$ is the Hessian of $f$ or an approximation to it; $\delta_c > 0$
is the trust-region radius; and $\|\cdot\|$ denotes the $\ell_2$ norm. The idea is to accept the trial step when
the quadratic model adequately predicts the behavior of the function, and to recompute the step
in a smaller region if it does not.

The trust-region approach to the problem of solving systems of nonlinear equations is just a
special case of the approach to the problem above; namely, for nonlinear equations, the objective
function $f(x)$ is taken to be $\|F(x)\|^2$.

Detailed  treatment  of  the  trust-region  approach  to  unconstrained  optimization  and  nonlinear 
equations can be found in Dennis and Schnabel [18], Sorensen [30], Moré [25], Moré and Sorensen
[26], Powell [21], and Shultz, Schnabel and Byrd [28].

For the equality constrained optimization problem, the successive quadratic programming (SQP)
algorithm is commonly used. Its step is found by computing a minimum of the quadratic model
of the Lagrangian at the current point, subject to linearized constraints. A trust-region algorithm
based on SQP adds the trust-region constraint to the subproblem and additional constraints
designed to ensure that the trust-region constraint and the linearized constraints are consistent.

2.2.1  Merit  Functions 

In  order  to  evaluate  a  trial  step,  trust-region  algorithms  use  merit  functions,  which  are  functions 
related  to  the  problem  in  such  a  way  that  the  improvement  in  the  merit  function  signifies  progress 
toward  the  solution  of  the  problem. 

For  unconstrained  minimization,  a  natural  choice  for  a  merit  function  is  the  objective  function 
itself.  Let 

$\phi(s) = f(x_c) + \nabla f(x_c)^T s + \frac{1}{2} s^T H_c s$  (2)

denote the quadratic model of the merit function. We define two related functions.

The actual reduction is defined as

$ared_c(s_c) = f(x_c) - f(x_c + s_c)$,  (3)

and the predicted reduction is defined as

$pred_c(s_c) = \phi(0) - \phi(s_c)$  (4)

so  that  the  predicted  reduction  in  the  merit  function  is  an  approximation  to  the  actual  reduction 
in  the  merit  function. 

The  standard  way  to  evaluate  the  trial  step  in  trust-region  methods  is  to  consider  the  ratio  of 
the  actual  reduction  to  the  predicted  reduction.  A  value  lower  than  a  small  predetermined  value 
causes  the  step  to  be  rejected.  Otherwise  the  step  is  accepted. 
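The acceptance test can be sketched as follows for the unconstrained case, where the merit function is the objective itself. This is a minimal illustration only: the helper names `quadratic_model` and `evaluate_step` and the threshold `eta = 1e-4` are assumptions made for the example, not values taken from the text.

```python
def quadratic_model(f0, g, H):
    """Return phi(s) = f0 + g^T s + 0.5 * s^T H s for dense Python lists."""
    def phi(s):
        Hs = [sum(hij * sj for hij, sj in zip(row, s)) for row in H]
        linear = sum(gi * si for gi, si in zip(g, s))
        quadratic = 0.5 * sum(si * hsi for si, hsi in zip(s, Hs))
        return f0 + linear + quadratic
    return phi


def evaluate_step(f, phi, x, s, eta=1e-4):
    """Return (accepted, ratio) for the trial step s taken from x.

    eta is a hypothetical "small predetermined value" for the ratio test.
    """
    ared = f(x) - f([xi + si for xi, si in zip(x, s)])  # actual reduction (3)
    pred = phi([0.0] * len(s)) - phi(s)                 # predicted reduction (4)
    if pred <= 0.0:
        return False, float("-inf")  # the model predicts no decrease: reject
    ratio = ared / pred
    return ratio >= eta, ratio


# Example: for a quadratic objective with the exact Hessian the model is exact,
# so the ratio of actual to predicted reduction is 1.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
x_c = [1.0, 1.0]
phi = quadratic_model(f(x_c), [2.0, 4.0], [[2.0, 0.0], [0.0, 4.0]])
accepted, ratio = evaluate_step(f, phi, x_c, [-0.5, -0.5])
```

On rejection a trust-region method would recompute the step in a smaller region; the rule for shrinking the radius is left out here because the text does not specify one.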

For nonlinear systems of equations, the norm of the residuals serves as a merit function. For
constrained optimization, the merit function is some expression that involves both the objective
function and the constraints.

We  shall  see  that  conventional  merit  functions  prove  to  be  inadequate  for  multilevel  algorithms. 

2.2.2  Fraction of Cauchy Decrease

To assure global convergence of a trust-region algorithm for problem UNC, the trial step is required
to satisfy a fraction of Cauchy decrease condition. This mild condition means that the trial
step, $s_c$, must predict at least a fraction of the decrease predicted by the Cauchy step, which is the
steepest descent step for the model within the trust region. We must have, for some fixed $\kappa > 0$,

$\phi(s_c) - \phi(0) \le \kappa \, [\phi(s_c^{CP}) - \phi(0)]$,  (5)

where $s_c^{CP} = -\alpha^{CP} \nabla f(x_c)$ with

$$\alpha^{CP} = \begin{cases} \dfrac{\|\nabla f(x_c)\|^2}{\nabla f(x_c)^T H_c \nabla f(x_c)} & \text{if } \dfrac{\|\nabla f(x_c)\|^3}{\nabla f(x_c)^T H_c \nabla f(x_c)} \le \delta_c, \\[2ex] \dfrac{\delta_c}{\|\nabla f(x_c)\|} & \text{otherwise.} \end{cases}$$

See Dennis and Schnabel [18], pp. 139-141, for details on the Cauchy point.

The fraction of Cauchy decrease property implies a weaker condition which has a more
convenient form and is frequently used as a technical lemma in the global convergence proofs.

Lemma 2.1 Let $s_c$ satisfy (5). Then

$\phi(0) - \phi(s_c) \ge \dfrac{\kappa}{2} \|\nabla f(x_c)\| \min \left\{ \dfrac{\|\nabla f(x_c)\|}{\|H_c\|}, \; \delta_c \right\}$.  (6)

References: Powell [21]; Moré [25].

Either (5) or (6) is necessary to establish global convergence theoretically.
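The Cauchy step and the bound of Lemma 2.1 can be checked numerically on a small example. The sketch below makes two assumptions that are not in the text: it falls back to the boundary step whenever the curvature $\nabla f^T H_c \nabla f$ is not positive (a standard safeguard), and it uses the Frobenius norm of $H_c$ in place of the $\ell_2$ norm, which can only make the lower bound smaller. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(H, v):
    return [dot(row, v) for row in H]


def cauchy_step(g, H, delta):
    """Steepest descent minimizer of the quadratic model inside the trust region."""
    gnorm = math.sqrt(dot(g, g))
    if gnorm == 0.0:
        return [0.0] * len(g)
    curv = dot(g, matvec(H, g))
    # Assumed safeguard: nonpositive curvature sends the step to the boundary.
    if curv > 0.0 and gnorm ** 3 / curv <= delta:
        alpha = gnorm ** 2 / curv
    else:
        alpha = delta / gnorm
    return [-alpha * gi for gi in g]


def predicted_decrease(g, H, s):
    """phi(0) - phi(s) for phi(s) = f + g^T s + 0.5 * s^T H s."""
    return -(dot(g, s) + 0.5 * dot(s, matvec(H, s)))


def lemma_bound(g, H, delta, kappa=1.0):
    """Right-hand side of the lemma, with the Frobenius norm standing in for ||H||."""
    gnorm = math.sqrt(dot(g, g))
    h_fro = math.sqrt(sum(h * h for row in H for h in row))
    inner = delta if h_fro == 0.0 else min(gnorm / h_fro, delta)
    return 0.5 * kappa * gnorm * inner


g = [2.0, 4.0]
H = [[2.0, 0.0], [0.0, 4.0]]
results = []
for delta in (10.0, 0.5):  # interior Cauchy point, then boundary Cauchy point
    s_cp = cauchy_step(g, H, delta)
    results.append((predicted_decrease(g, H, s_cp), lemma_bound(g, H, delta)))
```

In both cases the predicted decrease of the Cauchy step is at least the bound, which is the inequality the lemma states for $\kappa = 1$.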


2.2.3  Global  Convergence  Results 

Powell’s  global  convergence  theorem  [21]  for  any  unconstrained  minimization  trust-region  algorithm 
serves  as  a  prototype  for  most  trust-region  related  convergence  results. 

Theorem 2.1 Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable and bounded below on the level set
$\{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$. Assume that $\{H_i\}$ are uniformly bounded above. Let $\{x_i\}$ be the sequence
of iterates generated by a trust-region algorithm that satisfies (5) or (6). Then

$\liminf_{i \to \infty} \|\nabla f(x_i)\| = 0$.

Detailed treatment of the unconstrained minimization theory and practice can be found in Moré
[25], Moré and Sorensen [26], Sorensen [30], and Shultz, Schnabel and Byrd [28].

2.2.4  Tangent-Space  Methods  for  Constrained  Optimization 

The  multilevel  methods  proposed  here  may  be  viewed  as  a  generalization  of  an  approach  to  nonlinear 
programming  known  as  the  null  space  or  generalized  elimination  approach  (see  Fletcher  [13]). 

Different authors refer to different methods as “null space methods”, but the general idea of a
null space method for equality constrained minimization is to reduce the dimension of the problem
by  first  taking  the  step  intended  to  solve  the  constraint  equations,  and  then  to  minimize  the  model 
of  the  function  restricted  to  the  null  space  of  the  linearized  constraints.  The  resulting  minimization 
problem  is  of  a  lower  dimension  than  the  original  one. 

A  well-known  local  method  of  this  type  is  the  GRG  (Generalized  Reduced  Gradient)  algorithm. 
Details  of  GRG  and  other  null  space  methods  can  be  found  in  Lasdon  [20],  Fletcher  [13],  Avriel  [2], 
and  Gill,  Murray  and  Wright  [27]. 


A class of global trust-region algorithms that use the same general principle of reducing the
problem’s  dimension  is  known  as  the  class  of  tangent  space  methods.  The  tangent  space  approach 
was  introduced  to  avoid  the  possibility  of  inconsistency  of  the  constrained  trust-region  subproblem. 

Recent  work  on  these  methods  can  be  found  in  Maciel  [22]  and  Dennis,  El-Alem  and  Maciel 
[19].  The  main  feature  of  the  class  is  that  the  trial  step  is  computed  as  a  sum  of  two  substeps,  the 
first  of  which  is  made  toward  the  linearized  constraints  in  the  direction  orthogonal  to  the  null  space 
of  the  constraint  Jacobian,  while  the  second  is  made  to  minimize  the  model  of  the  Lagrangian  in 
the null space of the linearized constraints. The function and derivative information is computed
at a single point $x_c$.

The multilevel methods proposed here generalize the tangent space methods in the sense that
their trial steps are sums not of two substeps but of as many substeps as there are constraint
blocks, together with a substep on the model of the objective function, with the model information
computed at the points resulting from taking the substeps one by one.

3  Multilevel  Algorithms  for  Nonlinear  Optimization 

In  this  section  we  present  the  class  of  multilevel  optimization  algorithms  for  the  nonlinear  equality 
constrained minimization problem. Since the time of its introduction in [1], the class has undergone
changes. In [1], the globalization and extension to constrained optimization of only the local Brent
method was proposed. Recent developments¹ have extended the results to provide globalization
and extension to constrained optimization of the entire local Brown-Brent class.

3.1  Notation 

Due  to  arbitrary  blocking  of  the  constraints,  the  notation  becomes  cumbersome.  To  ease  the 
reading  effort,  we  omit  the  subscripts  and  superscripts  where  possible.  Here  is  an  explanation  of 
the  notation  conventions. 

Unless specified otherwise, all norms are $\ell_2$ norms.

From here on, we assume that the equality constraints of problem EQC are partitioned into
$M$ blocks of arbitrary size and composition. Let the constraints of the first block be numbered
from $n_1 = 1$ to $n_2 - 1$; the constraints of the second block, from $n_2$ to $n_3 - 1$; and so on, until the
constraints of the last block are numbered from $n_{M-1}$ to $n_M = m$.

The algorithms will be formally considered to have an outer loop, in which we make the decision
about the acceptability of the step, and an inner loop, in which we solve a sequence of minimization
subproblems. The sum of the substeps produced as solutions of these subproblems yields the total
trial step. The outer loop counter is $i$; the inner loop counter is $k$. Thus $k$ corresponds to the
block number of constraints. If the subscript $k$ is used with a constant, that constant refers to
the properties of the $k$-th block of constraints, independent of the iterates. Note that the term
“inner loop” is formal. The purpose of the inner loop is to compute a basis for the null space of the
Jacobian of our constraint system, but step by step, using information at different points, instead
of the simultaneous computation of, say, Newton’s method.

¹ Natalia Alexandrov and J. E. Dennis, Jr. A class of general trust-region multilevel algorithms for systems of
nonlinear equations: Global convergence theory. In preparation.

We denote the sequence of points generated by the outer loop of the algorithm by $\{x_i\}$ when
we consider the iterates as members of the sequence in convergence analysis. Otherwise, we use $x_c$, $x_-$,
and $x_+$ to denote the current, the previous, and the next iterates, respectively.

We denote the sequence of points generated within each inner loop by $y_k$, $k = 0, \ldots, M+1$,
when we need not consider the outer loop iteration number. Most of the time we shall be discussing
entities within a single iteration. Otherwise, we use subscripts and superscripts. For example,
$y_k^i$ denotes the inner $k$-th iterate within the $i$-th outer loop. Note that $y_0 = x_c$ and $y_{M+1} = x_+$.
The substep produced by solving the $k$-th subproblem of the inner loop is denoted by $s_k$, $k =
1, \ldots, M+1$. The sum $s_1 + \ldots + s_{M+1}$ yields the total trial step $s_c$. Again, we use subscripts, e.g.,
$s_i$, to denote the total step as a part of the sequence of steps produced by the algorithm.

We denote the radius of the trust region for subproblem $k$, centered at $y_{k-1}$, by $\delta_k$, $k =
1, \ldots, M+1$. The radius of the total trust region, centered at $x_c = y_0$, is $\delta_c$ or $\delta_i$.

We denote the projector onto the intersection of the null spaces of $\nabla C_1(x), \ldots, \nabla C_k(x)$ by $P_k$.
Again, when we omit superscripts, we refer to the objects within a single outer loop. For
example, $C_k(y_{k-1})$ refers to $C_k(y_{k-1}^i)$.
Additional notation will be introduced as needed.

3.2  General  Description 

The general class of multilevel algorithms can be described in the following way. The constraint
system of the problem is partitioned into $M$ arbitrary blocks. In practice, this block decomposition
is obvious in most cases. At the current approximation to a solution of problem EQC, $x_c$, we set
$y_0 = x_c$. The trial step is computed as follows.

We find an approximate minimizer, $s_1$, of the quadratic Gauss-Newton model about $y_0$ of the
first block of constraints in the trust region of radius $\delta_1$. The step is required to satisfy a fraction
of Cauchy decrease condition for this model and a mild boundedness condition discussed in the next
subsection. The step is taken to yield the point $y_1 = y_0 + s_1$.

We then find an approximate minimizer of the quadratic model of the second block of
constraints, restricted to the null space of the Jacobian of the first block. This model is built using the
information at the new point. It is important to emphasize that all the function and derivative
information for the second block is computed at the new point $y_1$. The next step, $s_2$, bounded by
its own trust region, is obtained to satisfy a fraction of Cauchy decrease condition for this restricted
model of the second block. The step is taken to yield the point $y_2$.

The  process  of  computing  steps  that  satisfy  sufficient  predicted  decrease  for  the  restricted  models 
of  progressively  smaller  dimensions  continues.  Again,  the  model  for  each  block  is  built  by 
using  the  function  and  derivative  information  at  the  most  recently  computed  point. 
The final step on the constraints, $s_M$, is obtained to produce sufficient predicted decrease in the
quadratic model, at $y_{M-1}$, of the last block of constraints, restricted to the intersection of the null
spaces of the Jacobians of all previous blocks.

When all the constraint blocks have been processed, $n - m$ degrees of freedom still remain.
The remaining variables are used in building a model of the objective function, so that the final
substep, $s_{M+1}$, is obtained to produce sufficient predicted decrease in the quadratic model at $y_M$
of the objective function, restricted to the intersection of the null spaces of the Jacobians of all
constraint blocks. The final step is taken to yield the next major iterate, i.e., the next approximation
to a solution of problem EQC. Thus, the total trial step from $x_c$ to $x_+$ is the sum of the substeps
in the inner sweep, i.e., $s_c = s_1 + \cdots + s_{M+1}$.

Unless  the  convergence  criterion  is  met,  the  total  trial  step  is  evaluated,  and  the  algorithm 
returns  to  process  again  the  first  block  of  constraints  in  a  trust  region  determined  by  the  success 
or  failure  of  the  trial  step. 
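The inner sweep described above can be sketched for a deliberately simple setting. The code below assumes one scalar constraint per block, maintains the intersection of null spaces by Gram-Schmidt projection, and uses a Cauchy step for the objective substep. These are illustrative choices, not the authors' implementation, and every name (`multilevel_trial_step`, `cons`, `cons_grads`, `f_grad`, `f_hess`, `deltas`) is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(v, basis):
    """Remove from v its components along the orthonormal vectors in basis."""
    p = list(v)
    for q in basis:
        c = dot(p, q)
        p = [pi - c * qi for pi, qi in zip(p, q)]
    return p


def truncate(s, delta):
    norm = math.sqrt(dot(s, s))
    return list(s) if norm <= delta else [delta / norm * si for si in s]


def multilevel_trial_step(cons, cons_grads, f_grad, f_hess, x_c, deltas):
    """One inner sweep; returns (x_plus, substeps). deltas has M + 1 radii."""
    y, basis, substeps = list(x_c), [], []
    for c_k, grad_k, delta_k in zip(cons, cons_grads, deltas):
        g = grad_k(y)              # derivatives at the newest point y_{k-1}
        p = project_out(g, basis)  # gradient restricted to earlier null spaces
        pp = dot(p, p)
        if pp > 1e-24:
            t = -c_k(y) / pp       # zero of the restricted linearization
            s = truncate([t * pi for pi in p], delta_k)
            basis.append([pi / math.sqrt(pp) for pi in p])
        else:
            s = [0.0] * len(y)     # block adds no new direction at this point
        y = [yi + si for yi, si in zip(y, s)]
        substeps.append(s)
    # Objective substep: Cauchy step of the model in the remaining null space.
    p = project_out(f_grad(y), basis)
    pp = dot(p, p)
    s = [0.0] * len(y)
    if pp > 1e-24:
        curv = dot(p, [dot(row, p) for row in f_hess(y)])
        alpha = deltas[-1] / math.sqrt(pp)
        if curv > 0.0:
            alpha = min(alpha, pp / curv)
        s = [-alpha * pi for pi in p]
    substeps.append(s)
    return [yi + si for yi, si in zip(y, s)], substeps


# Example: minimize x0^2 + x1^2 + x2^2 subject to two one-row constraint blocks.
cons = [lambda y: y[0] + y[1] + y[2] - 3.0, lambda y: y[0] - y[1]]
cons_grads = [lambda y: [1.0, 1.0, 1.0], lambda y: [1.0, -1.0, 0.0]]
f_grad = lambda y: [2.0 * yi for yi in y]
f_hess = lambda y: [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
x_plus, substeps = multilevel_trial_step(
    cons, cons_grads, f_grad, f_hess, [1.0, 0.0, 0.0], [10.0, 10.0, 10.0])
```

Because the constraints in this example are linear and the objective is a convex quadratic, a single sweep with generous radii lands on the constrained minimizer (1, 1, 1). A nonlinear problem would need the outer loop, with step evaluation and radius updates, around this sweep.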

3.2.1  Computing  the  Substeps 

During the constraint elimination stage, the substeps solve the following subproblems:

minimize $\frac{1}{2} \| C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s \|^2$

subject to $\nabla C_j(y_{j-1})^T s = 0$, $j = 1, \ldots, k-1$,

and possibly an additional constraint

on the step direction,

and $\|s\|_2 \le \delta_k$,

for $k = 1, \ldots, M$. (Note that for $k = 1$ there is no null space constraint.) Then the objective

function subproblem is:

minimize $f(y_M) + \nabla f(y_M)^T s + \frac{1}{2} s^T H(y_M) s$
subject to $\nabla C_j(y_{j-1})^T s = 0$, $j = 1, \ldots, M$,
and possibly an additional constraint
on the step direction,
and $\|s\|_2 \le \delta_{M+1}$.

If there is no additional constraint on the direction of the step, the subproblems produce a
trust-region generalization of the local Brent step. A constraint requiring that the step be parallel
to some coordinate hyperplane would be a generalization of the local Brown step. In practice, there
is no explicit constraint for the generalization of the Brown step; rather it is computed implicitly.

Let $Q_{k-1}$ be a matrix the columns of which form a basis for the intersection of the null spaces of
$\nabla C_1(y_0), \ldots, \nabla C_{k-1}(y_{k-2})$. A change of variables, $s = Q_{k-1} v$, converts the constrained
subproblems to unconstrained ones.
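As a worked illustration of this change of variables, and assuming it takes the form $s = Q_{k-1} v$ with $v$ a reduced vector (the explicit form is an assumption made here), the $k$-th constraint subproblem becomes a trust-region problem without null space constraints:

```latex
\begin{aligned}
\underset{v}{\text{minimize}} \quad
  & \tfrac{1}{2} \, \bigl\| C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T Q_{k-1} v \bigr\|^2 \\
\text{subject to} \quad
  & \| Q_{k-1} v \|_2 \le \delta_k .
\end{aligned}
```

The null space constraints hold automatically because $\nabla C_j(y_{j-1})^T Q_{k-1} = 0$ for $j = 1, \ldots, k-1$. If the columns of $Q_{k-1}$ are orthonormal, then $\|Q_{k-1} v\|_2 = \|v\|_2$ and the bound reduces to $\|v\|_2 \le \delta_k$.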

For relatively small problems, the null space bases can be computed by using the QR
decomposition to find the basis for the null space of $\nabla C_1(y_0)$, and then by updating the decomposition for
subsequent subproblems to find a basis for the null space intersections. For larger problems, the
QR decomposition becomes prohibitively expensive. In that case, reduced basis projectors can be
used. More details about the null space basis computations can be found, for example, in [27].

There are various methods for solving large-scale trust-region subproblems. We have high
hopes for the method recently developed by D. Sorensen of Rice University.

However,  once  the  subproblems  with  null  space  constraints  are  converted  into  unconstrained 
trust-region  subproblems,  the  steps  may  be  chosen  in  any  manner,  as  long  as  they  satisfy  two  mild 
conditions. 

1.  As mentioned earlier, if there are no additional constraints on subproblem $k$, its solution,
a Levenberg-Marquardt step for the reduced problem, produces a generalization of the Brent
step. That is, the substep is orthogonal to the linearized constraint hyperplane, for all blocks
numbered $k+1, \ldots, M$. However, we do not require that the substeps be orthogonal. We
require that each substep satisfy

$\|s_k\| \le \Lambda_k \|C_k(y_{k-1})\|$  (7)

for some positive constant $\Lambda_k$ that depends only on the properties of that particular constraint
block but is independent of the iteration. Other conditions are possible to assure global
convergence. However, this condition, first formalized in [19], is reasonable in that it is
enforced automatically by any algorithms of interest for computing linearly feasible points.
For instance, this is easily shown for the extensions of both Brown and Brent steps.


2.  We also require each substep to satisfy a fraction of Cauchy decrease condition for the
particular subproblem that substep solves. This is also a very mild condition: it is satisfied by
all reasonable methods. Note that we do not place any conditions on the total trial step, only
on the substeps.

It is easy to show that if $s_k^u$ is an unconstrained Brown or Brent substep (or any substep
out of the local Brown-Brent class), we can claim the following:

if $\|s_k^u\| \le \delta_k$, let $s_k = s_k^u$. Otherwise let

$s_k = \dfrac{\delta_k}{\|s_k^u\|} \, s_k^u$.  (8)

Then $s_k$ satisfies the fraction of Cauchy decrease condition on subproblem $k$. The proof is
given in Alexandrov and Dennis².

Thus, we see that simply truncating the unconstrained Brown or Brent substep to the size
of the trust region will produce sufficient predicted decrease in the models of the constraint
blocks.
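The effect of the truncation can be made concrete for the Gauss-Newton model $\frac{1}{2}\|c + Js\|^2$ of one block. The sketch below assumes that the unconstrained substep satisfies $J s^u = -c$ (it reaches the linearized constraints), in which case scaling it by $\tau = \min(1, \delta / \|s^u\|)$ reduces the model from $\frac{1}{2}\|c\|^2$ to $\frac{1}{2}(1-\tau)^2 \|c\|^2$. The function name and the test data are hypothetical, and the comparison against the Cauchy step itself is not reproduced here.

```python
import math


def truncated_step_decrease(c, J, s_u, delta):
    """Truncate s_u to the trust region and return (s, predicted decrease).

    c is the block residual, J its Jacobian given as a list of rows, and the
    model is 0.5 * ||c + J s||^2. All names are illustrative.
    """
    norm = math.sqrt(sum(si * si for si in s_u))
    tau = 1.0 if norm <= delta else delta / norm
    s = [tau * si for si in s_u]
    resid = [ci + sum(jij * sj for jij, sj in zip(row, s))
             for ci, row in zip(c, J)]
    model_at_zero = 0.5 * sum(ci * ci for ci in c)
    model_at_s = 0.5 * sum(ri * ri for ri in resid)
    return s, model_at_zero - model_at_s


# Example: J s_u = -c holds for this data, and the radius halves the step.
c = [2.0, 1.0]
J = [[2.0, 0.0], [0.0, 1.0]]
s_u = [-1.0, -1.0]
delta = 0.5 * math.sqrt(2.0)
s, pred = truncated_step_decrease(c, J, s_u, delta)
```

With $\tau = 0.5$ the predicted decrease equals $\frac{1}{2}\|c\|^2 \, \tau (2 - \tau) = 1.875$, so a truncated step still captures a fixed share of the decrease of the full step.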


3.2.2  The  Merit  Function  and  Its  Model 

Merit functions used to evaluate the progress of single-block trust-region algorithms consist of some
combination of the objective function and the constraints. One common merit function is the $\ell_2$
penalty function $f(x) + \rho \|C(x)\|^2$, where $\rho$ is the penalty parameter.

In the process of the multilevel algorithm development, it has become apparent that
conventional merit functions are inadequate for measuring progress of the multilevel methods, because a
conventional merit function does not take into account the order in which minimization proceeds.

The difficulty can be summarized as follows:

•  The result of the $k$-th minimization subproblem predicts decrease for the $k$-th component
from point $y_{k-1}$ to point $y_k$. It predicts no change for all previous blocks. However, there
is no prediction at all about how $s_1 + \ldots + s_k$ changes, and likely increases, the norms of the
blocks numbered $k+1, \ldots, M$. Neither does any substep, except $s_{M+1}$, predict the behavior
of the objective function.

² Natalia Alexandrov and J. E. Dennis, Jr. A class of general trust-region multilevel algorithms for systems of
nonlinear equations: Global convergence theory. In preparation.

This observation brought us to the conclusion that the merit function must take into account
the multilevel structure of the scheme. Consider the following modified $\ell_2$ penalty function:

$$\mathcal{P}(x; \rho_1, \ldots, \rho_M) = f(x) + \rho_M(\|C_M(x)\|^2 + \rho_{M-1}(\|C_{M-1}(x)\|^2 + \rho_{M-2}(\|C_{M-2}(x)\|^2 + \ldots + \rho_2(\|C_2(x)\|^2 + \rho_1 \|C_1(x)\|^2))))$$

$$= f(x) + \sum_{k=1}^{M} \Bigl( \prod_{j=k}^{M} \rho_j \Bigr) \|C_k(x)\|^2,$$

where $\rho_k \ge 1$, $k = 1, \ldots, M$. The initial choice $\rho_k = 1$ is arbitrary
and scale-dependent. The only requirement is that $\rho_k \ge 1$. For theoretical purposes, the problem
is assumed to be well-scaled.
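The nested form and the sum-of-products form of the modified penalty function are easy to compare numerically. The sketch below takes the objective value, the squared block norms $\|C_k(x)\|^2$, and the penalty parameters as plain numbers; the function names and the sample values are hypothetical.

```python
def merit_nested(f_val, block_norms_sq, rho):
    """Evaluate f + rho_M(c_M + rho_{M-1}(c_{M-1} + ... + rho_1 c_1)).

    block_norms_sq[k-1] holds ||C_k(x)||^2 and rho[k-1] holds rho_k.
    """
    acc = 0.0
    for rho_k, c_k in zip(rho, block_norms_sq):  # k = 1, ..., M
        acc = rho_k * (c_k + acc)
    return f_val + acc


def merit_sum(f_val, block_norms_sq, rho):
    """Evaluate f + sum_k (prod_{j=k}^{M} rho_j) * c_k."""
    total = f_val
    for k, c_k in enumerate(block_norms_sq):
        weight = 1.0
        for rho_j in rho[k:]:
            weight *= rho_j
        total += weight * c_k
    return total


# Example with M = 3 blocks: both forms give the same value.
value_nested = merit_nested(1.0, [1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
value_sum = merit_sum(1.0, [1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The weight of block $k$ is the product $\rho_k \rho_{k+1} \cdots \rho_M$, so with all $\rho_k \ge 1$ the blocks processed earliest in the sweep carry the largest weights.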

The new merit function penalizes for the possible predicted increase in the constraint blocks
$k, \ldots, M$, or in the objective function, that may have occurred during inner loop iterations $1, \ldots, k-1$.

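At $y_{M+1} = x_+ = x_c + s_c$, we model each $\|C_k(x_+)\|^2$ by $\|C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s_k\|^2$, and so we
model the merit function at $x_+$ by

$$
\begin{aligned}
M_c(s_1, \ldots, s_{M+1}; \rho_1, \ldots, \rho_M)
 &= f(y_M) + \nabla f(y_M)^T s_{M+1} + \tfrac{1}{2} s_{M+1}^T H_M s_{M+1} \\
 &\quad + \rho_M \Big( \|C_M(y_{M-1}) + \nabla C_M(y_{M-1})^T s_M\|^2
   + \rho_{M-1} \Big( \|C_{M-1}(y_{M-2}) + \nabla C_{M-1}(y_{M-2})^T s_{M-1}\|^2 \\
 &\quad + \rho_{M-2} \Big( \|C_{M-2}(y_{M-3}) + \nabla C_{M-2}(y_{M-3})^T s_{M-2}\|^2 + \ldots \\
 &\quad + \rho_2 \Big( \|C_2(y_1) + \nabla C_2(y_1)^T s_2\|^2 + \rho_1 \|C_1(y_0) + \nabla C_1(y_0)^T s_1\|^2 \Big) \ldots \Big) \Big) \Big) \\
 &= f(y_M) + \nabla f(y_M)^T s_{M+1} + \tfrac{1}{2} s_{M+1}^T H_M s_{M+1} \\
 &\quad + \sum_{k=1}^{M} \Big( \prod_{j=k}^{M} \rho_j \Big) \|C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s_k\|^2 .
\end{aligned}
$$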

We define the actual reduction as the difference between the merit function values at $x_c$ and
$x_+$, and we define the predicted reduction as the difference between the value of the merit function
at $x_c$ and the value of the model at $x_+$.
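Written out, the two definitions take the following form. This is only an expansion of the definitions stated above, with $y_0 = x_c$; the symbol $H_M$ for the Hessian (or Hessian approximation) in the quadratic model of $f$, and the shorthand $\rho$ for $(\rho_1, \ldots, \rho_M)$, are notational assumptions.

```latex
\begin{aligned}
ared &= \mathcal{P}(x_c; \rho) - \mathcal{P}(x_+; \rho), \\
pred &= \mathcal{P}(x_c; \rho) - M_c(s_1, \ldots, s_{M+1}; \rho) \\
     &= \Big[ f(x_c) - f(y_M) - \nabla f(y_M)^T s_{M+1}
            - \tfrac{1}{2} s_{M+1}^T H_M s_{M+1} \Big] \\
     &\quad + \sum_{k=1}^{M} \Big( \prod_{j=k}^{M} \rho_j \Big)
        \Big[ \|C_k(x_c)\|^2 - \|C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s_k\|^2 \Big].
\end{aligned}
```

For $k = 1$ the bracketed constraint term compares the residual at $x_c$ with its own linearization at $x_c$, so a substep that decreases the linearized first block makes it nonnegative. For $k \ge 2$ the bracket compares a residual at $x_c$ with a model built at $y_{k-1}$, so its sign is not determined in advance.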

3.2.3  Updating  the  Penalty  Parameters 

This  penalty  parameter  updating  scheme  for  multilevel  methods  generalizes  the  scheme  proposed 
in  El-Alem  [10],  [11].  It  ensures  that  our  merit  function  has  an  essential  property,  namely,  that 
unless  an  iterate  is  optimal,  the  predicted  reduction  should  always  be  positive.  We  use  the  following 
procedure: 

Algorithm 3.1 Penalty Parameter Updating Algorithm (Done on completion of each inner
sweep of minimization problems.)

Denote the set $\{s_1, \ldots, s_k\}$ by $S_k$ and denote the set $\{\rho_1, \ldots, \rho_k\}$ by $P_k$.

At the beginning of a multilevel algorithm, set $\rho_1^- = \ldots = \rho_M^- = 1$ and choose $\beta \in (0, 1)$.

1. Compute $Cpred_1(s_1) = \|C_1(y_0)\|^2 - \|C_1(y_0) + \nabla C_1(y_0)^T s_1\|^2$.

2. Do $k = 1, M-1$
     Update $\rho_k$.
     Compute
       $Cpred_{k+1}(S_{k+1}; P_{k-1}, \rho_k^-) =
          [\|C_{k+1}(y_0)\|^2 - \|C_{k+1}(y_k) + \nabla C_{k+1}(y_k)^T s_{k+1}\|^2]
          + \rho_k^- \, Cpred_k(S_k; P_{k-1})$.
     if $Cpred_{k+1}(S_{k+1}; P_{k-1}, \rho_k^-) \ge [\ldots] \, Cpred_k(S_k; P_{k-1})$ then
       $\rho_k = \rho_k^-$.
       $Cpred_{k+1}(S_{k+1}; P_{k-1}, \rho_k) = Cpred_{k+1}(S_{k+1}; P_{k-1}, \rho_k^-)$.
     else
       $\rho_k = \bar\rho_k + [\ldots]$, where $\bar\rho_k = [\ldots]$.
       Compute $Cpred_{k+1}(S_{k+1}; P_{k-1}, \rho_k)$.
     end if
   end Do

3. Update $\rho_M$.
     Compute
       $pred(S_{M+1}; P_{M-1}, \rho_M^-) =
          [f(y_0) - (f(y_M) + \nabla f(y_M)^T s_{M+1} + \tfrac{1}{2} s_{M+1}^T H_M s_{M+1})]
          + \rho_M^- \, Cpred_M(S_M; P_{M-1})$.
     if $pred(S_{M+1}; P_{M-1}, \rho_M^-) \ge [\ldots] \, Cpred_M(S_M; P_{M-1})$ then
       $\rho_M = \rho_M^-$.
       $pred(S_{M+1}; P_{M-1}, \rho_M) = pred(S_{M+1}; P_{M-1}, \rho_M^-)$.
     else
       $\rho_M = \bar\rho_M + [\ldots]$.
       Compute $pred(S_{M+1}; P_{M-1}, \rho_M)$.
     end if
End

Note that without updating the penalty parameters we can be assured of the positive predicted
reduction from $x_c$ only for the first block of constraints, i.e., only $Cpred_1(s_1)$ is definitely positive
without additional considerations. To ensure that $Cpred_2(s_1, s_2; \rho_1)$ is positive, we may have to
increase $\rho_1$. Now that $Cpred_2(s_1, s_2; \rho_1)$ is positive, we use it to ensure that the next partial
predicted reduction is positive, and so on. So, for each substep $s_k$, the predicted reduction
accumulated by the step $s_1 + \ldots + s_k$ is at least a fraction of the predicted decrease accumulated
by the step $s_1 + \ldots + s_{k-1}$.

Thus  the  predicted  reduction  of  the  first  block  is  the  most  heavily  penalized  one. 

It  should  be  emphasized  that  the  step  computation  is  completely  independent  of  the  penalty 
parameter  computation. 
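The cascade described above can be sketched in code. This is a minimal sketch and not the exact scheme of Algorithm 3.1: the acceptance fraction `frac = 0.5`, the increase rule (the smallest admissible value plus `beta`), and all function and variable names are assumptions made for illustration. The inputs are the unweighted per-block predicted changes measured from $x_c$, which the step computation has already produced.

```python
def update_penalties(delta, f_pred, rho_prev, beta=0.5, frac=0.5):
    """Cascade the penalty parameter updates through the blocks.

    delta[k]  : ||C_{k+1}(y_0)||^2 - ||C_{k+1}(y_k) + grad C_{k+1}(y_k)^T s_{k+1}||^2
                (0-based k); delta[0] is Cpred_1 and is assumed nonnegative.
    f_pred    : f(y_0) minus the quadratic model of f evaluated at s_{M+1}.
    rho_prev  : previous values rho_1^-, ..., rho_M^-.
    frac, beta: ASSUMED constants, not taken from the source.
    """
    M = len(delta)
    rho = list(rho_prev)
    cpred = delta[0]
    for k in range(M):
        # Next unweighted contribution: block k+2, or the objective after block M.
        nxt = delta[k + 1] if k + 1 < M else f_pred
        trial = nxt + rho[k] * cpred
        if cpred > 0 and trial < frac * rho[k] * cpred:
            # ASSUMED rule: smallest rho with nxt + rho*cpred >= frac*rho*cpred, plus beta.
            rho[k] = -nxt / ((1.0 - frac) * cpred) + beta
            trial = nxt + rho[k] * cpred
        cpred = trial
    return rho, cpred  # the final accumulated value is the total predicted reduction

# Illustrative values: the second block's linearized norm grew along s_1.
rho, pred = update_penalties(delta=[4.0, -6.0], f_pred=1.0, rho_prev=[1.0, 1.0])
print(rho, pred)
```

Under these assumptions an increase is triggered only when the test fails, and the new value is then strictly larger than the old one, so the parameters never decrease. The routine uses only quantities computed during the step, which reflects the independence of the step computation from the penalty parameter computation.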
3.2.4  Step Evaluation and Trust-Region Radii Updating

Although there are various schemes for evaluating the trial step and updating the trust region radii,
for the sake of simplicity in discussion, we adopt the following strategy:

•  The  total  trial  step  is  evaluated  outside  the  inner  loop. 

•  All  individual  trust  region  radii  are  equal  and  are  updated  simultaneously  by  the  same  factor. 

Other strategies for practical implementations are discussed in Alexandrov^. We would like to
emphasize that the simultaneous expansion or contraction of the trust region radii is not a technical
requirement.

The  algorithm  for  evaluating  the  step  and  updating  the  trust  region  radii  follows. 

Algorithm 3.2 Step Evaluation / Trust Region Update

Given $\delta_k > 0$, $k = 1, \ldots, M$ (or $k = 1, \ldots, M+1$ for optimization), $\delta_{max} > 0$, $\delta_{min} > 0$,
$0 < \eta_1 < \eta_2 < 1$, $\alpha_1 \in (0, 1]$, $\alpha_2 > 1$, $x_c \in \Re^n$, ared, pred.

Compute $r = ared / pred$.

if $r < \eta_1$ then (step not accepted)
    $[\ldots]$
else if $r > \eta_2$ then (step accepted)
    $\delta_k = \min(\delta_{max}, \alpha_2 [\ldots])$
    $x_c = x_+$.
else (step accepted)
    $\delta_k = \max\{\delta_{min}, [\ldots]\}$
    $x_c = x_+$.
end if

We note that if the step is not accepted, the trust region radii are decreased without any safeguard.
However, if the step is accepted, the next trust region radius is set to be no smaller than a
predetermined positive value $\delta_{min}$. This strategy is extremely important in the global convergence
theory. It ensures that the trust region radius is bounded away from zero and hence that the
penalty parameters are bounded from above. This technique was introduced in [16].
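The accept/reject logic and the role of the lower safeguard can be sketched as follows. This is a minimal sketch under stated assumptions: a single common radius `delta` stands for all $\delta_k$, a rejected step shrinks it by the factor `alpha1`, a very successful step enlarges it by `alpha2`, and the default parameter values are illustrative. The exact update expressions of Algorithm 3.2 are not reproduced here.

```python
def evaluate_step(ared, pred, delta, x_c, x_plus,
                  eta1=0.1, eta2=0.75, alpha1=0.5, alpha2=2.0,
                  delta_min=1e-3, delta_max=1e3):
    """Return (new iterate, new common radius, accepted flag).

    Assumes pred > 0, which the penalty parameter update is designed to ensure
    away from optimality. All default constants are ASSUMED values.
    """
    r = ared / pred
    if r < eta1:
        # Rejected: shrink with no lower safeguard (assumed form alpha1 * delta).
        return x_c, alpha1 * delta, False
    if r > eta2:
        # Accepted with very good agreement: enlarge, keep within [delta_min, delta_max].
        return x_plus, min(delta_max, max(delta_min, alpha2 * delta)), True
    # Accepted: keep the radius, but never below delta_min.
    return x_plus, max(delta_min, delta), True

# A poor step is rejected and the radius halves; the iterate is unchanged.
print(evaluate_step(ared=0.01, pred=1.0, delta=1.0, x_c="x_c", x_plus="x_+"))
# An accepted step taken with a tiny radius restarts from delta_min.
print(evaluate_step(ared=0.5, pred=1.0, delta=1e-5, x_c="x_c", x_plus="x_+"))
```

The second call shows the safeguard at work: after any accepted step the next iteration starts with a radius of at least `delta_min`, while repeated rejections within one iteration may still shrink the radius freely.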

3.2.5  The  Stopping  Criteria 

We use the first order necessary conditions for problem EQC to terminate the algorithm and require
that

$$
\begin{aligned}
\|C_1(y_0)\| &< \epsilon_{tol} \\
\|C_2(y_1)\| &< \epsilon_{tol} \\
 &\;\;\vdots \\
\|C_M(y_{M-1})\| &< \epsilon_{tol} \\
[\ldots] &< \epsilon_{tol}
\end{aligned}
\qquad (9)
$$

hold simultaneously.

^Natalia Alexandrov. On implementation of multilevel algorithms for nonlinear equations and equality constrained
optimization. In preparation.

Since

$$ \|s_k\| = O(\|C_k(y_{k-1})\|), $$

if $\|C_k(y_{k-1})\|$ is small, $\|s_k\|$ will be small and the inner loop iterates will be close to each other,
and in the limit we can show ([1]) that at least a subsequence of the generated sequence of the
outer loop iterates will converge to a stationary point of problem EQC.

The tolerance parameters $\epsilon_{tol}$ need not be the same, but for convenience, they are taken to be
the same throughout the convergence analysis.

The reason for requiring such a stopping criterion is both theoretical and practical. The conventional
test for the entire norm of the constraint residual being close to 0 does not differentiate between the
individual $\|C_k(y_{k-1})\|$. It is essential for the convergence proof to determine how close to feasibility
an iterate must be in order for the penalty parameters not to be increased. This is a measure of
feasibility versus optimality. The conventional stopping criterion allows only the total feasibility to
be measured and thus to determine when $\rho_M$ does not have to be increased. But even if $\rho_M$ is not
increased, the remaining penalty parameters $\rho_1, \ldots, \rho_{M-1}$ may have to be increased because of the
relative sizes of the component block norms. The conventional criterion does not allow us to measure
relative feasibility of one block of constraints with respect to the others.

In  practice,  we  do  not  wish  to  evaluate  the  residuals  at  the  same  point  just  for  the  sake  of  the 
stopping  criterion. 

Other  stopping  criteria  are  possible,  but  the  one  above  is  the  most  natural  one. 
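A blockwise test of this kind can reuse the residuals $C_k(y_{k-1})$ that the inner loop has already evaluated. The sketch below is illustrative only: the function name, the tolerance value, and the use of a reduced (projected) gradient as the optimality measure are assumptions.

```python
import numpy as np

def blockwise_converged(block_residuals, reduced_gradient, tol=1e-6):
    """Require every block residual norm and the optimality measure to be below tol.

    block_residuals[k-1] holds C_k(y_{k-1}) as computed in the inner loop, so no
    extra constraint evaluations at a common point are needed.
    The optimality measure (a reduced gradient) is an ASSUMPTION of this sketch.
    """
    feasible = all(np.linalg.norm(c) < tol for c in block_residuals)
    return bool(feasible and np.linalg.norm(reduced_gradient) < tol)

# One block is still far from feasible, so the test fails even though the
# other block and the optimality measure are already small.
print(blockwise_converged([np.array([1e-8]), np.array([1e-3])], np.zeros(2)))
```

Testing the blocks separately keeps the information about which block is lagging, which a single aggregate residual norm would hide.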

3.2.6  The  Statement  of  the  Algorithm 

The  formal  description  of  the  algorithm  follows. 

Let  the  constraints  be  partitioned  into  M  blocks. 

Algorithm  3.3  Multilevel  Algorithm  for  Equality  Constrained  Optimization 

Given $\delta_k > 0$, $k = 1, \ldots, M$, $\delta_{max} > \delta_{min} > 0$, $0 < \eta_1 < \eta_2 < 1$, $\alpha_1 \in (0, 1]$, $\alpha_2 > 1$, $x_c \in \Re^n$.

Outer Loop: Do until convergence:
    $y_0 = x_c$.

    Compute the trial step.

    Inner Loop: Do $k = 1, M$
        If $y_{k-1}$ is not feasible then

            Compute $s_k$ that satisfies a fraction of Cauchy decrease
            condition on $\frac{1}{2}\|C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s\|^2$ restricted to
            the intersection of the null spaces of $\nabla C_j(y_{j-1})^T s = 0$, $j = 1, \ldots, k-1$,
            and $\|s_k\| \le \Lambda_k \|C_k(y_{k-1})\|$ (satisfied automatically).

            $y_k = y_{k-1} + s_k$.

        End if

    End Inner Loop

    Compute $s_{M+1}$ to satisfy the fraction of Cauchy decrease
    condition on the subproblem: minimize the quadratic model of $f$ at $y_M$ restricted to
    the intersection of the null spaces of $\nabla C_j(y_{j-1})^T s = 0$, $j = 1, \ldots, M$,
    and $\|s\|_2 \le \delta_M$.

    $y_{M+1} = y_M + s_{M+1}$.

    $x_+ = y_{M+1}$.

    The trial step is: $s_c = s_1 + \ldots + s_{M+1}$.

    Update the penalty parameters.

    Evaluate the step and update the trust region radius.
    If the step is accepted, set $x_c = x_+$.

End Outer Loop
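The computation of one trial step can be sketched as follows. This is a minimal sketch, not the implementation behind Algorithm 3.3: every substep is taken to be the Cauchy (steepest descent) step of its subproblem, which satisfies a fraction of Cauchy decrease condition with fraction one; the null spaces are computed from an SVD; one common radius `delta` is used; and the Hessian approximation in the model of $f$ is taken to be zero. All function names are hypothetical.

```python
import numpy as np

def null_space(A, tol=1e-12):
    """Orthonormal basis (columns) of {s : A s = 0}, computed from the SVD."""
    if A.shape[0] == 0:
        return np.eye(A.shape[1])
    _, sv, Vt = np.linalg.svd(A)
    rank = int(np.sum(sv > tol * max(1.0, sv[0])))
    return Vt[rank:].T

def trial_step(x_c, blocks, jacobians, grad_f, delta):
    """blocks[k](y) returns C_{k+1}(y); jacobians[k](y) returns its Jacobian
    (one row per constraint). Returns x_+ and the list of substeps."""
    y = np.asarray(x_c, dtype=float).copy()
    processed = np.zeros((0, y.size))   # Jacobians of the blocks handled so far
    steps = []
    for C, J in zip(blocks, jacobians):
        c, A = C(y), J(y)
        Z = null_space(processed)       # directions that keep earlier linearizations fixed
        B = A @ Z
        g = B.T @ c                     # reduced gradient of 0.5*||c + B u||^2 at u = 0
        s = np.zeros_like(y)
        if np.linalg.norm(g) > 1e-14:   # skipped when the block is already feasible
            Bg = B @ g
            t = min((g @ g) / (Bg @ Bg), delta / np.linalg.norm(g))
            s = Z @ (-t * g)            # Cauchy step, clipped to the trust region
        steps.append(s)
        y = y + s
        processed = np.vstack([processed, A])
    # Objective substep: steepest descent in the null space of all block Jacobians.
    Z = null_space(processed)
    g = Z.T @ grad_f(y)
    s = np.zeros_like(y)
    if np.linalg.norm(g) > 1e-14:
        s = Z @ (-delta * g / np.linalg.norm(g))   # linear model: go to the boundary
    steps.append(s)
    return y + s, steps

# Two linear one-row blocks in R^3 and the objective f(x) = x_0 (illustrative only).
a1 = np.array([[1.0, 1.0, 0.0]])
a2 = np.array([[0.0, 1.0, 1.0]])
blocks = [lambda y: a1 @ y - 1.0, lambda y: a2 @ y - 2.0]
jacs = [lambda y: a1, lambda y: a2]
x_plus, steps = trial_step(np.zeros(3), blocks, jacs,
                           lambda y: np.array([1.0, 0.0, 0.0]), delta=2.0)
print(x_plus, [np.linalg.norm(s) for s in steps])
```

In this small linear example each constraint substep removes its block residual completely, and the later substeps leave the earlier blocks untouched because they lie in the null spaces of the earlier Jacobians. For nonlinear constraints only the linearized earlier blocks are preserved, which is why the merit function has to account for the processing order.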

We  should  note  that  there  is  an  option  to  eliminate  only  a  subset  of  constraints  via  the  described 
procedure.  In  this  case,  the  rest  of  the  constraints  and  the  objective  function  would  be  restricted  to 
the  intersection  of  the  null  spaces  of  the  Jacobians  of  the  processed  constraints,  and  the  resulting 
reduced  optimization  problem  would  be  solved  by  a  chosen  method.  The  discussion  of  this  approach 
is  left  for  later  work. 

4  Global  Convergence  Results 

In this section we give a summary of the global convergence theory for multilevel algorithms.

4.1  Basic  Ingredients  of  a  Global  Convergence  Proof 

Our proof contains the general ingredients of a global convergence analysis for a trust-region
method. The first three are required for a typical analysis of an unconstrained minimization
algorithm.

1.  The trial step must be shown to satisfy a sufficient predicted decrease condition, usually the
FCD condition. Our algorithm assumes that the substeps satisfy the FCD condition on the
subproblems. It remains for us to show that the total step from $x_c$ to $x_+$ satisfies a suitable
decrease condition.

2.  The difference between the actual and predicted reduction must be bounded above by
a constant multiple of the square of the total step norm plus multiples of higher powers of
the step norm. This is easily shown for multilevel algorithms.

3.  The algorithm must be shown to be well-defined, i.e., we must prove that the ratio of the
actual reduction to predicted reduction can be made greater than a given $\eta_1 \in (0, 1)$ after a
finite number of trial step computations. Given 2, it is easy to show that as the trust region
radius approaches zero, the ratio of the actual reduction to predicted reduction approaches
one. For the algorithm to be well-defined we must show that the ratio of the predicted to
actual reduction approaches one faster than the trust region radius goes to zero. This is easily
established for our algorithm.

An  algorithm  for  constrained  optimization  that  uses  penalty  parameters  in  its  merit  function 
requires  the  fourth  ingredient. 
4.  The penalty parameter in the merit function must be shown to be bounded. The technique is
to prove that the product of the penalty parameter and the trust region radius is bounded by
a constant independent of the iterates. The sequence of the trust region radii is then shown
to be bounded away from zero. Here a crucial role is played by the trust region updating
technique introduced in [16]: after a successful iteration and before starting the next iteration,
the trust region radius is set to be no smaller than a pre-defined value. This way of updating
allows us to prove that the sequence of penalty parameters is bounded from above.

The method for updating the penalty parameters ensures that the sequence of penalty
parameters is nondecreasing which, together with its boundedness, allows us to conclude that the
penalty parameter sequence converges and, moreover, remains constant after a finite number
of increases. This fact is used in the global convergence theorem.
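Ingredients 2 and 3 can be illustrated on a small unconstrained example. The sketch below is an assumption-laden illustration, not part of the convergence proof: the test function, the starting point and the steepest descent direction are chosen for convenience, and the model is linear, i.e., the Hessian approximation is taken to be zero.

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

def grad_f(x):
    return np.array([2.0 * x[0], 4.0 * x[1]])

def ared_and_pred(x, s):
    """Actual reduction of f and the reduction predicted by the linear model."""
    ared = f(x) - f(x + s)
    pred = -(grad_f(x) @ s)
    return ared, pred

x = np.array([1.0, 1.0])
d = -grad_f(x) / np.linalg.norm(grad_f(x))   # unit steepest descent direction
for t in (1e-1, 1e-2, 1e-3):                 # t plays the role of the trust region radius
    ared, pred = ared_and_pred(x, t * d)
    print(t, abs(ared - pred) / t ** 2, ared / pred)
```

For this quadratic function the quantity $|ared - pred| / \|s\|^2$ stays constant as the step shrinks, which is the bound required in ingredient 2, and the ratio $ared/pred$ tends to one, so a sufficiently small step is eventually accepted.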

4.2  Assumptions 

We  make  the  following  assumptions  on  the  problem  and  the  sequence  of  steps  and  iterates: 

•  f,C  are  at  least  twice  continuously  differentiable. 

•  The gradient of the constraints has full rank. This is a strong assumption, but it is standard
practice to require it for the sake of convergence proofs. Practical experience suggests that
the breakdown of this assumption does not necessarily diminish the efficacy of our algorithm.
Not assuming full rank would allow us to prove a slightly weaker convergence result.

•  $f(x)$, $\nabla f(x)$, $\nabla^2 f(x)$, $H_M$, $C(x)$, $\nabla C(x)$, $\nabla^2 c_i(x)$, $i = 1, \ldots, m$,
and $([\ldots])^{-1}$, $k = 1, \ldots, M$, are all uniformly bounded in norm for all $x$
in the domain of interest.

Since  we  require  that  the  Hessian  of  the  objective  function  be  only  bounded,  we  can  even  take 
it  to  be  0.  Of  course,  such  an  approximation  would  lower  the  effectiveness  of  the  algorithm. 

4.3  Summary  of  the  Proof 

In this subsection we provide an overview of steps in the convergence proof. The details can be
found in [1] and Alexandrov and Dennis.

•  We show that under our assumptions, the norm of any intermediate sum of the substeps is
bounded by a constant times the norm of the total trial step.

•  Several technical results provide workable expressions of the FCD (fraction of Cauchy
decrease) condition similar to the one used for unconstrained optimization.

•  A standard result provides an upper bound on the error between actual reduction and
predicted reduction.

*The global convergence theory for algorithms with nonmonotone penalty parameters has been investigated by
Mahmoud El-Alem [12].

^Natalia Alexandrov and J. E. Dennis, Jr. A class of general trust-region multilevel algorithms for systems of
nonlinear equations: Global convergence theory. In preparation.
•  By virtue of the penalty parameter updating scheme, the multilevel algorithms have the
property  that  if  an  iterate  is  feasible,  the  penalty  parameters  are  not  increased.  We  show 
that  if  the  iterates  are  sufficiently  close  to  feasibility,  the  penalty  parameters  are  not  increased 
either.  This  result  is  crucial  to  the  proof  of  convergence,  giving  a  sufficient  condition  for  the 
penalty  parameters  not  to  be  increased. 

•  Next we establish an upper bound on the product of the penalty parameters with the trust
region radii. This result allows us to conclude that the radii are bounded below if the penalty
parameters increase. The penalty parameter sequences are shown to be nondecreasing, which,
together with their boundedness from above, allows us to conclude that the penalty
parameters tend to a limit, and, moreover, stay constant after a finite number of outer iterations.
The limit is shown to exist, but its explicit expression is not known.

•  We have shown that the total trust region radius is bounded away from zero if any of the
penalty parameters are increased. Now we show that the radius is always bounded away from
zero. The trust region updating strategy ensures that it is bounded from above.

•  The next result guarantees that the algorithm is well defined, i.e., that after a finite number
of outer loop iterations an acceptable step $s_c$ with

$$ \frac{ared}{pred} \ge \eta_1 $$

will be found.

•  In  the  global  convergence  result,  we  show  that  if  the  objective  function  is  bounded  below,  then 
the  sequence  of  iterates  generated  by  a  multilevel  algorithm  has  a  subsequence  convergent  to 
a  stationary  point  of  the  equality  constrained  minimization  problem. 

•  As  a  corollary,  we  can  now  conclude  that  the  multilevel  algorithm  for  nonlinear  equations  is 
also  globally  convergent. 

5  Discussion  and  Concluding  Remarks 

We  have  described  a  broad  new  class  of  multilevel  algorithms  for  solving  the  nonlinear  equations 
problem  and  the  equality  constrained  optimization  problem.  The  class  can  be  considered  as  a 
globalization  and  an  extension  of  the  local  class  of  algorithms  of  Brown  and  Brent  for  solving 
nonlinear  systems  of  equations. 

The main practical appeal of the multilevel algorithms is that in the case of equality constrained
optimization, they allow the user to partition the constraint system arbitrarily, to fit the application,
and to process the blocks of constraints separately. In their finite-difference derivative form, they
require fewer function evaluations than Newton's method.

The  multilevel  class  is  characterized  by  requiring  very  mild  conditions  to  be  imposed  on  the 
trial  steps.  All  reasonable  algorithms  satisfy  these  conditions  automatically. 

We have established global convergence theory for the entire class. The theory implies
convergence of the nonlinear equations solver, which, to the author's knowledge, is the first theoretically
supported method for globalizing Brown-Brent methods. The global convergence theory was made
possible by the introduction of the new merit function that takes into account the order of the
constraint processing. The nested penalty parameters are updated by an extension of the scheme
proposed by El-Alem [10].

The  algorithms  are  expected  to  be  applicable  to  the  problem  of  the  multidisciplinary  design 
optimization  and  to  serve  as  a  foundation  for  the  study  of  the  general  multilevel  optimization 
problem. 

We  would  like  to  mention  one  more  application.  The  design  of  complex  engineering  systems  is 
by  nature  a  multicriteria  optimization  problem.  The  design  projects  are  distinguished  by  very  large 
numbers  of  variables,  constraints,  and  expensive  analyses.  To  solve  the  problem,  it  is  necessary 
to  break  it  into  disciplines,  each  of  which  produces  its  own  optimal  design.  The  discipline  designs 
are  then  incorporated  into  a  total  design.  The  multilevel  methods  proposed  here  would  allow 
researchers  to  integrate  constraints  obtained  from  different  sources. 

To  solve  the  multicriteria  optimization  problem,  it  is  necessary  to  decide  when  an  iterate  is 
optimal.  One  of  the  approaches  to  optimality  is  the  statement  of  the  multicriteria  problem  as  a 
multilevel  optimization  problem,  i.e.,  the  problem  of  minimizing  a  function  on  a  feasible  set,  which 
is  an  optimal  set  for  another  function,  and  so  on.  In  such  an  approach,  the  user  places  priorities 
on  the  optimization  problems  that  are  to  be  solved  sequentially.  We  believe  that  the  multilevel 
algorithms  proposed  here  will  serve  as  a  beginning  for  a  detailed  study  of  the  general  multilevel 
optimization  problem. 

Directions  of  research  in  progress  include  local  convergence  rates,  implementation,  extensive 
testing  on  applications,  incorporation  of  bound  and  inequality  constraints,  and  extensions  to  general 
nonlinear  bilevel  and  multilevel  optimization. 